Serving and consuming an HTTP multipart/mixed response in Python by felipecrv · Pull Request #33 · apache/arrow-experiments

felipecrv · 2024-08-29T00:10:44Z

The client parses the multipart response produced by server/server.py
by using the multipart message parser from the Python email module.

This module holds the entire message in memory and seems to spend a lot
of time looking for part delimiters and encoding/decoding the parts.
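
For readers who have not used the email module this way, here is a minimal sketch of feeding a multipart/mixed body to its incremental parser. It is an illustration under stated assumptions, not the code in simple_client.py: the boundary string, the tiny two-part body, and the helper names build_body and parse_multipart are all hypothetical, and the choice of email.policy.HTTP is an assumption.

```python
from email import policy
from email.parser import BytesFeedParser

BOUNDARY = "example-boundary"  # hypothetical boundary, for illustration only


def build_body(boundary: str = BOUNDARY) -> bytes:
    """Build a tiny multipart/mixed body with a JSON part and a text part."""
    lines = [
        f"--{boundary}",
        "Content-Type: application/json",
        "",
        '[{"ticker": "SGJ"}]',
        f"--{boundary}",
        "Content-Type: text/plain",
        "",
        "Hello Client,",
        f"--{boundary}--",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_multipart(content_type: str, chunks):
    """Feed body chunks to the email parser; return (type, payload) pairs."""
    parser = BytesFeedParser(policy=policy.HTTP)  # policy choice is an assumption
    # The parser learns the boundary from a top-level Content-Type header,
    # so the HTTP response header is replayed before the body.
    parser.feed(f"Content-Type: {content_type}\r\n\r\n".encode("ascii"))
    for chunk in chunks:
        parser.feed(chunk)
    message = parser.close()  # the whole message is held in memory here
    return [
        (part.get_content_type(), part.get_payload(decode=True))
        for part in message.iter_parts()
    ]


if __name__ == "__main__":
    body = build_body()
    # Simulate HTTP response chunks by slicing the body into small pieces.
    chunks = [body[i:i + 16] for i in range(0, len(body), 16)]
    content_type = f'multipart/mixed; boundary="{BOUNDARY}"'
    for part_type, payload in parse_multipart(content_type, chunks):
        print(part_type, len(payload), "bytes")
```

The parser only returns a message after close(), which is why nothing can be consumed until the full response has been received and scanned.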

On my machine, multipart/mixed parsing accounts for about 85% of the
total execution time. Once the ~1GB Arrow stream message is fully in
memory, parsing it takes only 0.06% of the total execution time.

$ python simple_client.py
-- 3731 bytes of JSON data:
[
  {'ticker': 'SGJ', 'description': 'Syhnffek Gacb Jdylqis'}
  {'ticker': 'EILD', 'description': 'Eicfef Iiafeutm Lydut Dbmgq'}
  {'ticker': 'QTO', 'description': 'Qclxkqjd Tkxan Odmac'}
  {'ticker': 'IHTS', 'description': 'Iowjy Hieuj Tvwecy Smxedh'}
  {'ticker': 'TGFJ', 'description': 'Tvztlhba Garebomj Fnwvwgf Jffldbg'}
  ...+55 entries...
]
-- 988931832 bytes of Arrow data:
Schema:
ticker: string
price: int64
volume: int64

Parsed 42000000 records in 6836 batch(es)
-- Text Message:
Hello Client,

6836 Arrow batch(es) were sent in 6.561 seconds through 6837 HTTP
response chunks. Average size of each chunk was 144644.13 bytes.

--
Sincerely,
The Server
-- End of Text Message --
13.645 seconds elapsed
11.833 seconds (86.72%) seconds parsing multipart/mixed response
0.011 seconds (0.08%) seconds parsing Arrow stream

Closes apache/arrow#40598

felipecrv · 2024-08-29T00:11:20Z

@ianmcook

ianmcook · 2024-08-29T13:09:44Z

Could you please add a small README.md file alongside server.py in the server subdir that briefly explains what the server does?

(similar to https://github.com/apache/arrow-experiments/blob/main/http/get_simple/python/server/README.md)

ianmcook · 2024-08-29T15:33:45Z

Can you use carets in the markdown footnotes (like this) so GitHub renders them as footnotes? Thanks

felipecrv · 2024-08-29T16:27:27Z

> Can you use carets in the markdown footnotes (like this) so GitHub renders them as footnotes? Thanks

Done. I was going to look up the syntax after seeing the bad results.

ianmcook · 2024-08-29T16:46:20Z

Thanks @felipecrv, this looks great! The only problem I see here is that the calls to feedparser.feed() in the client example are excruciatingly slow, but you've explained clearly that this is an incidental effect of using the Python email module. Maybe later (with lower priority) we can come back and develop a more performant example.

ianmcook · 2024-08-29T16:48:16Z

I will merge later today if there are no other comments.

felipecrv · 2024-08-29T16:57:48Z

> Thanks @felipecrv, this looks great! The only problem I see here is that the calls to feedparser.feed() in the client example are excruciatingly slow, but you've explained clearly that this is an incidental effect of using the Python email module. Maybe later (with lower priority) we can come back and develop a more performant example.

Yeah. It's the parsing logic. Passing the entire 1GB message blob to email.message_from_bytes() is even slower, and that is without accounting for the time it takes to build the buffer.
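
For comparison, the whole-buffer alternative mentioned above can be sketched as follows. This is only an illustration, not code from the PR: the body, the boundary, and the helper name parse_whole_buffer are hypothetical, and email.policy.HTTP is an assumed policy choice.

```python
import email
from email import policy

BOUNDARY = "example-boundary"  # hypothetical boundary, for illustration only

# A tiny stand-in for the full response body, already buffered in memory.
BODY = (
    f"--{BOUNDARY}\r\n"
    "Content-Type: application/json\r\n"
    "\r\n"
    '[{"ticker": "SGJ"}]\r\n'
    f"--{BOUNDARY}\r\n"
    "Content-Type: text/plain\r\n"
    "\r\n"
    "Hello Client,\r\n"
    f"--{BOUNDARY}--\r\n"
).encode("ascii")


def parse_whole_buffer(content_type: str, body: bytes) -> list[str]:
    """Parse a fully buffered multipart body in one call; return part types."""
    # Prepending the header copies the buffer, an extra cost on a ~1GB body.
    header = f"Content-Type: {content_type}\r\n\r\n".encode("ascii")
    message = email.message_from_bytes(header + body, policy=policy.HTTP)
    return [part.get_content_type() for part in message.iter_parts()]


if __name__ == "__main__":
    print(parse_whole_buffer(f'multipart/mixed; boundary="{BOUNDARY}"', BODY))
```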

I called this simple_client.py because we should later add a streaming_client.py.

felipecrv added 2 commits August 27, 2024 22:29

http/python: Rewrite README section about chunking

e60d2c5

get_multipart/python: Add server.py and simple_client.py

3621161

felipecrv commented Aug 29, 2024

Comment thread http/get_simple/python/server/README.md

felipecrv mentioned this pull request Aug 29, 2024

[Python] Create Python examples of HTTP GET Arrow client/server supporting multipart/mixed response apache/arrow#40598

Closed

ianmcook reviewed Aug 29, 2024

Comment thread http/get_multipart/python/client/simple_client.py Outdated

ianmcook reviewed Aug 29, 2024

Comment thread http/get_multipart/python/server/server.py Outdated

felipecrv added 5 commits August 29, 2024 11:28

get_multipart/python: Explain what urlsafe characters are

418ec25

get_multipart/python: Add two new READMEs

af2acd9

get_multipart/python: Move module-level docs to README

64e4b52

fixup! get_multipart/python: Add two new READMEs

2efad06

Add a general boundary generation algorithm recommendation

71c322d

felipecrv requested a review from ianmcook August 29, 2024 15:11

ianmcook reviewed Aug 29, 2024

Comment thread http/get_multipart/python/client/simple_client.py Outdated

ianmcook reviewed Aug 29, 2024

Comment thread http/get_multipart/python/client/README.md Outdated

felipecrv added 3 commits August 29, 2024 13:23

Always specify policy

f8d960d

Use the right md syntax for footnotes

6dae231

Change note to warning

985684d

felipecrv added 2 commits August 29, 2024 13:31

Fix positioning of footnote links

cda31f7

fixup! Fix positioning of footnote links

1d82be8

felipecrv requested a review from ianmcook August 29, 2024 16:36

ianmcook approved these changes Aug 29, 2024

felipecrv mentioned this pull request Aug 29, 2024

[Python] Create Python examples of indirect HTTP GET Arrow client and server apache/arrow#40596

Closed

ianmcook merged commit 5fb6547 into apache:main Aug 29, 2024

felipecrv deleted the multipart branch August 30, 2024 14:07

ianmcook mentioned this pull request Jan 24, 2025

[Python] Create Python examples of HTTP POST Arrow client/server supporting multipart/form-data request apache/arrow#40599

Closed

ianmcook merged 12 commits into apache:main from felipecrv:multipart